AITopics | discrete policy

Collaborating Authors

discrete policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Scalable Crawling Algorithm Utilizing Noisy Change-Indicating Signals

Busa-Fekete, Róbert, Zimmert, Julian, György, András, Qiu, Linhai, Sung, Tzu-Wei, Shen, Hao, Choi, Hyomin, Subramaniam, Sharmila, Xiao, Li

arXiv.org Machine LearningFeb-4-2025

Web refresh crawling is the problem of keeping a cache of web pages fresh, that is, having the most recent copy available when a page is requested, given a limited bandwidth available to the crawler. Under the assumption that the change and request events, resp., to each web page follow independent Poisson processes, the optimal scheduling policy was derived by Azar et al. 2018. In this paper, we study an extension of this problem where side information indicating content changes, such as various types of web pings, for example, signals from sitemaps, content delivery networks, etc., is available. Incorporating such side information into the crawling policy is challenging, because (i) the signals can be noisy with false positive events and with missing change events; and (ii) the crawler should achieve a fair performance over web pages regardless of the quality of the side information, which might differ from web page to web page. We propose a scalable crawling algorithm which (i) uses the noisy side information in an optimal way under mild assumptions; (ii) can be deployed without heavy centralized computation; (iii) is able to crawl web pages at a constant total rate without spikes in the total bandwidth usage over any time interval, and automatically adapt to the new optimal solution when the total bandwidth changes without centralized computation. Experiments clearly demonstrate the versatility of our approach.

artificial intelligence, ci signal, machine learning, (19 more...)

arXiv.org Machine Learning

2502.0243

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)

Add feedback

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

Wu, Kun, Zhu, Yichen, Li, Jinming, Wen, Junjie, Liu, Ning, Xu, Zhiyuan, Qiu, Qinru, Tang, Jian

arXiv.org Artificial IntelligenceOct-26-2024

Learning visuomotor policy for multi-task robotic manipulation has been a long-standing challenge for the robotics community. The difficulty lies in the diversity of action space: typically, a goal can be accomplished in multiple ways, resulting in a multimodal action distribution for a single task. The complexity of action distribution escalates as the number of tasks increases. In this work, we propose \textbf{Discrete Policy}, a robot learning method for training universal agents capable of multi-task manipulation skills. Discrete Policy employs vector quantization to map action sequences into a discrete latent space, facilitating the learning of task-specific codes. These codes are then reconstructed into the action space conditioned on observations and language instruction. We evaluate our method on both simulation and multiple real-world embodiments, including both single-arm and bimanual robot settings. We demonstrate that our proposed Discrete Policy outperforms a well-established Diffusion Policy baseline and many state-of-the-art approaches, including ACT, Octo, and OpenVLA. For example, in a real-world multi-task training setting with five tasks, Discrete Policy achieves an average success rate that is 26\% higher than Diffusion Policy and 15\% higher than OpenVLA. As the number of tasks increases to 12, the performance gap between Discrete Policy and Diffusion Policy widens to 32.5\%, further showcasing the advantages of our approach. Our work empirically demonstrates that learning multi-task policies within the latent space is a vital step toward achieving general-purpose agents.

artificial intelligence, machine learning, preprint arxiv, (13 more...)

arXiv.org Artificial Intelligence

2409.18707

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Soft Decomposed Policy-Critic: Bridging the Gap for Effective Continuous Control with Discrete RL

Zhang, Yechen, Sun, Jian, Wang, Gang, Li, Zhuo, Chen, Wei

arXiv.org Artificial IntelligenceAug-20-2023

Discrete reinforcement learning (RL) algorithms have demonstrated exceptional performance in solving sequential decision tasks with discrete action spaces, such as Atari games. However, their effectiveness is hindered when applied to continuous control problems due to the challenge of dimensional explosion. In this paper, we present the Soft Decomposed Policy-Critic (SDPC) architecture, which combines soft RL and actor-critic techniques with discrete RL methods to overcome this limitation. SDPC discretizes each action dimension independently and employs a shared critic network to maximize the soft $Q$-function. This novel approach enables SDPC to support two types of policies: decomposed actors that lead to the Soft Decomposed Actor-Critic (SDAC) algorithm, and decomposed $Q$-networks that generate Boltzmann soft exploration policies, resulting in the Soft Decomposed-Critic Q (SDCQ) algorithm. Through extensive experiments, we demonstrate that our proposed approach outperforms state-of-the-art continuous RL algorithms in a variety of continuous control tasks, including Mujoco's Humanoid and Box2d's BipedalWalker. These empirical results validate the effectiveness of the SDPC architecture in addressing the challenges associated with continuous control.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2308.10203

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Discrete Action On-Policy Learning with Action-Value Critic

Yue, Yuguang, Tang, Yunhao, Yin, Mingzhang, Zhou, Mingyuan

arXiv.org Machine LearningFeb-21-2020

Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient based deep RL algorithms efficiently. To effectively operate in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. We follow rigorous statistical analysis to design how to generate and combine these correlated actions, and how to sparsify the gradients by shutting down the contributions from certain dimensions. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques. We demonstrate these properties on OpenAI Gym benchmark tasks, and illustrate how discretizing the action space could benefit the exploration phase and hence facilitate convergence to a better local optimal solution thanks to the flexibility of discrete policy.

action space, algorithm, gradient, (11 more...)

arXiv.org Machine Learning

2002.03534

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas (0.04)
Europe > Italy > Sicily > Palermo (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Discretizing Continuous Action Space for On-Policy Optimization

Tang, Yunhao, Agrawal, Shipra

arXiv.org Artificial IntelligenceFeb-1-2019

The combinations of joint atomic actions, which quickly becomes explosion in the number of discrete actions can intractable when M increases. However, a simple fix be efficiently addressed by a policy with factorized is to represent the joint distribution over discrete actions as distribution across action dimensions. We factorized across dimensions, so that the joint policy is still show that the discrete policy achieves significant tractable. As prior works have applied such discretization performance gains with state-of-the-art on-policy method in practice (OpenAI, 2018; Jaśkowski et al., 2018), optimization algorithms (PPO, TRPO, ACKTR) we aim to carry out a systemic study of such straightforward especially on high-dimensional tasks with complex discretization method in simulated environments, and show dynamics. Additionally, we show that an ordinal how they improve upon on-policy optimization baselines.

artificial intelligence, discrete policy, machine learning, (13 more...)

arXiv.org Artificial Intelligence

1901.105

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback